Abstract


We present a novel algorithm, called the Cutting Algorithm (CA), for
improving the accuracy and reducing the computational cost of Least Squares
Support Vector Machines (LS-SVMs). The method divides the original problem
into several subproblems. Because one large problem is replaced by several
smaller ones, the algorithm requires fewer computations. Moreover, in some
cases where the standard LS-SVM cannot classify a dataset linearly, the
dataset can be classified after applying the CA. Reported and comparative
results on several well-known datasets and on synthetic data demonstrate
the efficiency and performance of the CA.
1 Introduction


Support Vector Machines (SVMs) were introduced by Vapnik in 1995 [14, 15]
within the area of statistical learning theory. SVMs are very popular and
powerful learning systems. Over the years, a variety of numerical
optimization algorithms for SVM learning have been proposed [5, 12, 7].
However, these traditional algorithms may not be practical on digital
computers, since the computing time required for a solution depends
strongly on the dimension and structure of the problem and on the
complexity of the algorithm used. LS-SVMs are least-squares versions of
SVMs, a set of related supervised learning methods that analyze data and
recognize patterns and that are used for classification and regression
analysis. In this version, one finds the solution by solving a set of
linear equations instead of the convex quadratic programming (QP) problem
of classical SVMs. Least-squares SVM classifiers were proposed by Suykens
and Vandewalle [13]. The LS-SVM is a modification of the SVM that can be
used to approximate nonlinear systems with higher accuracy
[13, 3, 8, 16, 10, 4]. With better performance than the SVM, the LS-SVM
model has been successfully applied in diverse fields, such as CO
concentration, income, precipitation, wind speed, and so on. In the
original space, the LS-SVM with equality constraints can be expressed as
follows:


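$$
\min_{w,\,b,\,\eta} \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l}\eta_i^2
\quad \text{s.t.} \quad y_i = w \cdot \varphi(x_i) + b + \eta_i, \quad i = 1, 2, \ldots, l, \tag{1}
$$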


where $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$ is a set of $l$
training samples, $x_i \in \mathbb{R}^m$ is an $m$-dimensional sample in the
input space, $y_i \in \{-1, +1\}$ is the class label of $x_i$,
$w \in \mathbb{R}^m$ and $b$ are the weight vector and bias, respectively,
and $C$ is a positive and sufficiently large constant that serves as the
regularization parameter. Also, $\eta_i$ denotes the slack variable. The
inputs to the SVM system are the training data and the constant $C$. The
system calculates proper slack variables $\eta_i$ and determines the
separating hyperplane. Moreover, $\eta_i$ is the training error
corresponding to the data sample $x_i$, and the quantity $C\eta_i^2$ is the
"penalty" for any data point $x_i$ that either lies within the margin on the
correct side of the hyperplane ($\eta_i < 1$) or lies on the wrong side of
the hyperplane ($\eta_i > 1$). Increasing the values of the slack variables
helps in reducing the effect of noisy support vectors. SVMs find the optimal
separating hyperplane with the minimal classification errors. Let $w$ and
$b$ denote the optimum values of the weight vector and bias, respectively.
The hyperplane can be represented as $w^T x + b = 0$, where
$w = [w_1, w_2, \ldots, w_m]^T$ and $x_i = [x_{1i}, x_{2i}, \ldots, x_{mi}]^T$;
$w$ is the normal vector of the hyperplane, and $b$ is the bias.


Using the nonlinear function $\varphi$, the data are mapped from the input
space to a higher-dimensional feature space. Similarly to [1], the
Lagrangian function of (1) can be built as


$$
L(w, b, \eta, \lambda) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l}\eta_i^2
+ \sum_{i=1}^{l}\lambda_i\big(y_i - w \cdot \varphi(x_i) - b - \eta_i\big), \tag{2}
$$


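where $\lambda_i$ denotes the Lagrange multiplier. The optimal point is the
saddle point of the Lagrangian function, which gives

$$
\begin{aligned}
\frac{\partial L}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{l}\lambda_i\,\varphi(x_i), \\
\frac{\partial L}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{l}\lambda_i = 0, \\
\frac{\partial L}{\partial \eta_i} = 0 &\;\Rightarrow\; \lambda_i = C\eta_i, \\
\frac{\partial L}{\partial \lambda_i} = 0 &\;\Rightarrow\; y_i - w \cdot \varphi(x_i) - b - \eta_i = 0.
\end{aligned} \tag{3}
$$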


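Thus, the optimization problem (1) can be transformed into the following
linear system by eliminating the vectors $w$ and $\eta$:

$$
\begin{bmatrix} 0 & Q^T \\ Q & K + C^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \lambda \end{bmatrix}
=
\begin{bmatrix} 0 \\ Y \end{bmatrix},
$$

where $Q = [1, \ldots, 1]^T$, $\lambda = [\lambda_1, \ldots, \lambda_l]^T$,
and $Y = [y_1, \ldots, y_l]^T$. Also, the kernel function can be set as

$$
K(x_i, x_j) = \varphi^T(x_i)\,\varphi(x_j).
$$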
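The linear system above can be assembled and solved directly. The following is a minimal sketch, not the authors' MATLAB implementation: the function names (`rbf_kernel`, `linear_kernel`, `lssvm_fit`, `lssvm_predict`) are hypothetical, the system is built under the assumption that the lower-right block is $K + C^{-1}I$, and the scaling inside the RBF kernel is an assumption as well.

```python
import numpy as np

def linear_kernel(A, B):
    """K(a, b) = a . b for every pair of rows of A and B."""
    return A @ B.T

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel; the exp(-||a-b||^2 / sigma^2) scaling is an assumption."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def lssvm_fit(X, y, C=10.0, kernel=linear_kernel):
    """Solve [[0, Q^T], [Q, K + I/C]] [b; lam] = [0; Y] for b and lam."""
    l = X.shape[0]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                      # Q^T
    A[1:, 0] = 1.0                      # Q
    A[1:, 1:] = kernel(X, X) + np.eye(l) / C
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # bias b, multipliers lam

def lssvm_predict(X_train, b, lam, X_new, kernel=linear_kernel):
    """Decision function sign(sum_i lam_i K(x_i, x) + b)."""
    return np.sign(kernel(X_new, X_train) @ lam + b)
```

Because every training sample receives its own multiplier, the cost of the solve grows with the full training-set size, which is the cost the CA aims to reduce.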


Two parameters are required in LS-SVM model selection: the bandwidth
$\sigma$ of the Gaussian radial basis kernel and the regularization
parameter $C$. In SVMs, the computational complexity of the training stage
is always a major problem, and this complexity can reduce the accuracy of
SVMs. The problem is more severe in the LS-SVM because its solution is not
sparse: training requires solving a system of linear equations that
involves every training sample. Consequently, as the number of training
samples increases, the computational cost of solving this system of linear
equations grows. By using the Cutting Algorithm (CA), we try to improve the
efficiency and reduce the computations of the LS-SVM.
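To make the cost argument concrete, the following rough estimate compares one large linear system with two smaller ones. It assumes a dense direct solver whose cost grows cubically with the size of the system; this cost model and the subset sizes $l_1$ and $l_2$ are illustrative assumptions, not measurements from the experiments.

```latex
\[
\underbrace{\mathcal{O}\big((l+1)^3\big)}_{\text{one system of size } l+1}
\quad \text{versus} \quad
\underbrace{\mathcal{O}\big((l_1+1)^3\big) + \mathcal{O}\big((l_2+1)^3\big)}_{\text{two systems, } l_1 + l_2 = l},
\qquad
l_1 = l_2 = \tfrac{l}{2} \;\Rightarrow\; 2\left(\tfrac{l}{2}\right)^3 = \tfrac{l^3}{4}.
\]
```

Under this assumption, an even split reduces the dominant term by roughly a factor of four; an uneven split gives a smaller saving.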


Motivated by the former discussion, in this paper we propose a novel
algorithm for this problem, which we call the CA. The CA reduces the
computations in the training stage for a variety of SVMs and also improves
the accuracy. We use the CA together with LS-SVMs on the training set. The
idea of the algorithm is to break the main problem into smaller problems
and solve each of them separately. As is well known, the linear LS-SVM
cannot classify nonlinear datasets; however, with the proposed algorithm,
nonlinear datasets can also be classified linearly. In addition, we
compare the proposed method with some other known methods.


The paper is organized as follows. In the next section, the viewpoint of
the CA is stated and the proposed CA is described. Section 3 gives a
geometric illustration of the CA with one cut. The CA in the general case
is studied in Section 4. Section 5 investigates the CA with n-stage cuts
and discusses some algorithms for the n-stage CA. The computational and
comparative results are given in Section 6. Finally, Section 7 states the
conclusions.
2 Viewpoint of cutting algorithm (CA)


In this section, we describe the CA. The idea is to find a hyperplane that
divides the training set into two subsets. To find this hyperplane, we work
with the coordinates (dimensions) of the input space and choose two
dimensions $r$ and $s$. On these dimensions, averaging the vectors of each
class gives two average points in the $rs$ hyperplane, denoted by
$x_{rs+}$ and $x_{rs-}$. Clearly, the point $x_{rs+}$ lies in the positive
class and the point $x_{rs-}$ lies in the negative class of vectors in the
$rs$ hyperplane. The line passing through $x_{rs+}$ and $x_{rs-}$ splits
both the positive class and the negative class into two disjoint parts.
Therefore, there exists a hyperplane in $\mathbb{R}^m$ that passes through
the two average points and is perpendicular to the $rs$ hyperplane. Since
its normal vector has only two nonzero components, its equation can be
written as $w_r x_r + w_s x_s + b_m = 0$ in $\mathbb{R}^m$. This hyperplane
divides the training set $S$ into two training sets $S_u$ and $S_d$ as
follows:


$$
\begin{aligned}
S_u &= \{(x_i, y_i) \in S \mid w_r x_{ir} + w_s x_{is} + b_m \ge 0\}, \\
S_d &= \{(x_i, y_i) \in S \mid w_r x_{ir} + w_s x_{is} + b_m < 0\}.
\end{aligned}
$$


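As stated before, we can obtain $S_u$ and $S_d$ from $S$. Consequently, we
have the following subproblems:

$$
\min \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l_u}\eta_i^2
\quad \text{s.t.} \quad y_i - w \cdot \varphi(x_i) - b = \eta_i, \quad (x_i, y_i) \in S_u, \quad i = 1, 2, \ldots, l_u, \tag{4}
$$

and

$$
\min \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l_d}\eta_i^2
\quad \text{s.t.} \quad y_i - w \cdot \varphi(x_i) - b = \eta_i, \quad (x_i, y_i) \in S_d, \quad i = 1, 2, \ldots, l_d, \tag{5}
$$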


where $l_u$ and $l_d$ are the numbers of samples in $S_u$ and $S_d$,
respectively. Clearly, $l_u + l_d = l$. In this approach, since the
averaging is done on only two dimensions, the desired separating
hyperplane, whose normal vector has two nonzero components, is found with
little computation. Moreover, the simplicity of the algorithm makes it easy
to separate the training set. The hyperplane dividing the set of training
vectors is called the cutting hyperplane.
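The construction of the cutting hyperplane can be sketched as follows. This is an illustrative sketch only: the function names `cutting_hyperplane` and `split_by_cut` are hypothetical, and the orientation of the normal vector (which side counts as $S_u$) is an arbitrary choice, since the text only requires the hyperplane to pass through both average points.

```python
import numpy as np

def cutting_hyperplane(X, y, r, s):
    """Return (w_r, w_s, b_m) of the line through the two class means in the rs-plane."""
    P = X[:, [r, s]]
    mean_pos = P[y == 1].mean(axis=0)    # x_{rs+}
    mean_neg = P[y == -1].mean(axis=0)   # x_{rs-}
    d = mean_pos - mean_neg              # direction of the line through both means
    if np.allclose(d, 0.0):
        raise ValueError("class means coincide in the chosen dimensions")
    w_r, w_s = -d[1], d[0]               # normal vector, perpendicular to d
    b_m = -(w_r * mean_pos[0] + w_s * mean_pos[1])
    return w_r, w_s, b_m

def split_by_cut(X, r, s, cut):
    """Boolean mask: True for samples in S_u (>= 0), False for samples in S_d (< 0)."""
    w_r, w_s, b_m = cut
    return w_r * X[:, r] + w_s * X[:, s] + b_m >= 0

# Small usage example with three-dimensional samples, cutting on dimensions 0 and 1.
X = np.array([[2.0, 0.0, 5.0], [0.0, 2.0, 5.0], [-2.0, 0.0, 7.0], [0.0, -2.0, 7.0]])
y = np.array([1, 1, -1, -1])
cut = cutting_hyperplane(X, y, r=0, s=1)
mask = split_by_cut(X, 0, 1, cut)        # S_u = X[mask], S_d = X[~mask]
```

Only two columns of the data are touched, so the cost of the cut does not depend on the remaining dimensions.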
3 Geometry illustration of CA with one stage cut


As a first step, we consider one cut, which splits the problem into two
parts. We denote the set of training vectors in the positive class by $X_p$
and the set in the negative class by $X_n$, that is,

$$
X_p = \{x_i \mid (x_i, y_i) \in S, \; y_i = +1\}, \tag{6}
$$
$$
X_n = \{x_i \mid (x_i, y_i) \in S, \; y_i = -1\}. \tag{7}
$$

Based on the former discussion, the procedure is given in Algorithm 1.


Algorithm 1 The cutting algorithm in two dimensions with one cut


1: Input the training set $S$ with input dimension $m = 2$.

2: Obtain the averages of $X_p$ and $X_n$ and denote them by the vectors
$x_{rs+}$ and $x_{rs-}$, respectively.

3: Obtain the hyperplane $w_m \cdot x + b_m = 0$ that passes through the
two average points $x_{rs+}$ and $x_{rs-}$.

4: Determine the two training sets $S_u$ and $S_d$ for one cut as follows:

$$
\begin{aligned}
S_u &= \{(x_i, y_i) \in S \mid w_m \cdot x_i + b_m \ge 0\}, \\
S_d &= \{(x_i, y_i) \in S \mid w_m \cdot x_i + b_m < 0\}.
\end{aligned}
$$

5: Use an SVM to determine the decision functions
$f_u(x) = \mathrm{sign}(g_u(x))$ and $f_d(x) = \mathrm{sign}(g_d(x))$ for
$S_u$ and $S_d$, respectively.

6: Output the decision function as follows:

$$
f(x) =
\begin{cases}
f_u(x), & w_m \cdot x + b_m \ge 0, \\
f_d(x), & w_m \cdot x + b_m < 0.
\end{cases}
$$
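The steps of Algorithm 1 can be sketched end to end on a small two-dimensional example. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a linear LS-SVM is used for both subsets, and the eight data points are invented so that no single line separates them.

```python
import numpy as np

def lssvm_fit_linear(X, y, C=10.0):
    """Linear LS-SVM via the bordered linear system; returns (w, b)."""
    l = X.shape[0]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = X @ X.T + np.eye(l) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, lam = sol[0], sol[1:]
    return X.T @ lam, b                   # w = sum_i lam_i x_i

def one_cut_fit(X, y, C=10.0):
    """Steps 2-5: class means, cutting line, split, one classifier per side."""
    mean_pos, mean_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    d = mean_pos - mean_neg
    w_m = np.array([-d[1], d[0]])         # normal of the line through both means
    b_m = -w_m @ mean_pos
    up = X @ w_m + b_m >= 0               # S_u mask; S_d is the complement
    f_u = lssvm_fit_linear(X[up], y[up], C)
    f_d = lssvm_fit_linear(X[~up], y[~up], C)
    return w_m, b_m, f_u, f_d

def one_cut_predict(model, X):
    """Step 6: route each sample by the cut, then apply f_u or f_d."""
    w_m, b_m, (w_u, b_u), (w_d, b_d) = model
    up = X @ w_m + b_m >= 0
    return np.sign(np.where(up, X @ w_u + b_u, X @ w_d + b_d))

# Invented toy data: positives on both sides of a central negative column.
X = np.array([[-3, 0.5], [-3, 1.5], [3, 0.5], [3, 1.5],
              [-0.5, 2], [0.5, 2], [-0.5, -2], [0.5, -2]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

w, b = lssvm_fit_linear(X, y)
acc_base = np.mean(np.sign(X @ w + b) == y)             # single linear LS-SVM
acc_ca = np.mean(one_cut_predict(one_cut_fit(X, y), X) == y)
```

On this toy set the cut is the vertical line through the two class means, and each half is linearly separable on its own, so the one-cut classifier should fit the training points where the single linear LS-SVM cannot.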


To demonstrate the accuracy and the performance, we test the CA on
classifying some training sets of synthetic data. Figures 1-6 show the
results. In all results, we make the following assumptions:


1. Each of the positive and negative classes has 100 elements.


2. The symbols + and x refer to the positive and the negative classes,
respectively.


3. Line (0) is the separating line $w \cdot \varphi(x) + b = 0$.


4. Line (-1) is the line $w \cdot \varphi(x) + b = -1$.


5. Line (+1) is the line $w \cdot \varphi(x) + b = +1$.
6. For all problems, we set $C = 10$.


7. We use the Radial Basis Function (RBF) kernel with $\sigma = 1$ in
nonlinear problems (i.e., $K(x, x_i) = e^{-\|x - x_i\|^2 / \sigma^2}$).


Figure 1 shows the classification of some data with the LS-SVM. In Figure 2,
the CA is used with the LS-SVM to classify the data of the same problem.
One can check that the classification is done accurately.


Figure 2: Geometric interpretation of LS-SVM in dominant problem by using CA 


Moreover, Figures 3-6 illustrate the classification of nonlinear problems.
Figures 3 and 4 show the classification of one problem with the LS-SVM and
with the LS-SVM with CA, respectively. Also, Figures 5 and 6 show the
classification of another problem with the LS-SVM and with the LS-SVM with
CA, respectively. In none of these test problems can the LS-SVM classify
the data accurately, whereas the LS-SVM with CA does so.



Figure 4: Geometric interpretation of nonlinear LS-SVM in dominant problem by using 
CA 


4 CA with one cut in general status 


Here, we study the CA with one cut in the general case. In high dimensions,
we again take the average on just two dimensions, so the cut does not
increase the computations. Therefore, the efficiency of the algorithm
improves as the dimension of the training vectors increases. The proposed
method is given in Algorithm 2. Note that Algorithm 2 uses one cut,
although more cuts can be used in general, depending on the conditions of
the problem. In addition, it is better to select dimensions along which the
training vectors are distributed uniformly.


Figure 5: Geometric interpretation of nonlinear LS-SVM in nondominant problem


Figure 6: Geometric interpretation of nonlinear LS-SVM in nondominant problem by
using CA


5 CA with n-stage cuts 


In this section, the CA with n-stage cuts is investigated. After one cut,
the training set $T$ is separated into two subsets $T_0$ and $T_1$.
Clearly, the subsets $T_0$ and $T_1$ can be considered as new sets, so they
can be cut again. We denote the sets obtained from cutting $T_0$ by
$T_{00}$ and $T_{01}$, and the sets obtained from cutting $T_1$ by $T_{10}$
and $T_{11}$. These four disjoint sets are on the third stage. Repeating
this procedure for $n$ stages, we have the following sets:

$$
T_{i_1 i_2 \ldots i_m}, \quad m = 1, 2, \ldots, n-1, \quad i_1, i_2, \ldots, i_m \in \{0, 1\}. \tag{8}
$$
Algorithm 2 The cutting algorithm in two dimensions with one stage cut in general
case


1: Input the training set $S$ with an arbitrary input dimension $m > 2$.

2: Choose two dimensions $r$ and $s$ of $\mathbb{R}^m$.

3: Obtain the averages of $X_p$ and $X_n$ in the $rs$ hyperplane and denote
them by the vectors $x_{rs+}$ and $x_{rs-}$, respectively.

4: Obtain the hyperplane $w_r x_r + w_s x_s + b_m = 0$ that passes through
the two average points $x_{rs+}$ and $x_{rs-}$.

5: Determine the two training sets $S_u$ and $S_d$ for one cut as follows:

$$
\begin{aligned}
S_u &= \{(x_i, y_i) \in S \mid w_r x_{ir} + w_s x_{is} + b_m \ge 0\}, \\
S_d &= \{(x_i, y_i) \in S \mid w_r x_{ir} + w_s x_{is} + b_m < 0\}.
\end{aligned}
$$

6: Use an SVM to determine the decision functions
$f_u(x) = \mathrm{sign}(g_u(x))$ and $f_d(x) = \mathrm{sign}(g_d(x))$ for
$S_u$ and $S_d$, respectively.

7: Output the decision function as follows:

$$
f(x) =
\begin{cases}
f_u(x), & w_r x_r + w_s x_s + b_m \ge 0, \\
f_d(x), & w_r x_r + w_s x_s + b_m < 0.
\end{cases}
$$


In the nth stage (final stage), we have two disjoint subsets
$T_{i_1 i_2 \ldots i_{n-1} 0}$ and $T_{i_1 i_2 \ldots i_{n-1} 1}$.
Therefore, we have $2^n$ subsets in the nth stage. If the cutting
hyperplane in the first stage is $w_0 \cdot x + b_0 = 0$, then for any
$m = 1, 2, \ldots, n-1$, the cutting hyperplane in the mth stage in the
subset $T_{i_1 i_2 \ldots i_m}$ is as follows:

$$
w_{i_0 i_1 i_2 \ldots i_m} \cdot x + b_{i_0 i_1 i_2 \ldots i_m} = 0, \quad i_0 = 0. \tag{9}
$$


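Also, the decision function for the training subset
$T_{i_1 i_2 \ldots i_n}$, obtained by using an SVM, is denoted as follows:

$$
y = f_j(x), \quad j = \sum_{k=1}^{n} i_k\, 2^{k-1}, \tag{10}
$$

where $T_{i_1 i_2 \ldots i_n}$ is one of the $2^n$ subsets of $T$ in the
last stage. Now, the condition $p_{i_1 i_2 \ldots i_k}(x)$ is defined as
follows:

$$
p_{i_1 i_2 \ldots i_k}(x):
\begin{cases}
w_{i_0 i_1 i_2 \ldots i_{k-1}} \cdot x + b_{i_0 i_1 i_2 \ldots i_{k-1}} \ge 0, & i_k = 1, \\
w_{i_0 i_1 i_2 \ldots i_{k-1}} \cdot x + b_{i_0 i_1 i_2 \ldots i_{k-1}} < 0, & i_k = 0.
\end{cases}
$$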


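Finally, the decision function will be

$$
f(x) =
\begin{cases}
f_0(x), & p_0(x),\; p_{00}(x),\; \ldots,\; p_{00 \ldots 0}(x), \\
\;\;\vdots \\
f_j(x), & p_{i_1}(x),\; p_{i_1 i_2}(x),\; \ldots,\; p_{i_1 i_2 \ldots i_n}(x), \quad j = \sum_{k=1}^{n} i_k\, 2^{k-1}, \\
\;\;\vdots \\
f_{2^n - 1}(x), & p_1(x),\; p_{11}(x),\; \ldots,\; p_{11 \ldots 1}(x).
\end{cases}
$$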
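The n-stage procedure amounts to a binary tree of cuts with one classifier per final subset. The sketch below illustrates that reading under stated assumptions: the function names and the dictionary-based tree are hypothetical, each final subset gets a linear LS-SVM, the same two dimensions `r` and `s` are reused at every stage, and a branch stops early when a subset contains only one class or a cut leaves one side empty. Final subsets are addressed by their path of bits $i_1 i_2 \ldots$ rather than by the index $j$.

```python
import numpy as np

def fit_leaf(X, y, C=10.0):
    """Linear LS-SVM on one final subset; returns (w, b)."""
    l = X.shape[0]
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = X @ X.T + np.eye(l) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return X.T @ sol[1:], sol[0]

def build(X, y, stages, r=0, s=1, C=10.0):
    """Cut recursively for up to `stages` levels, then fit one classifier per subset."""
    if stages > 0 and len(np.unique(y)) == 2:
        P = X[:, [r, s]]
        mean_pos = P[y == 1].mean(axis=0)
        d = mean_pos - P[y == -1].mean(axis=0)
        w = np.array([-d[1], d[0]])        # line through both class means
        b = -w @ mean_pos
        up = P @ w + b >= 0                # bit 1 for >= 0, bit 0 for < 0
        if up.any() and (~up).any():
            return {"cut": (w, b),
                    1: build(X[up], y[up], stages - 1, r, s, C),
                    0: build(X[~up], y[~up], stages - 1, r, s, C)}
    return {"leaf": fit_leaf(X, y, C)}

def predict_one(node, x, r=0, s=1):
    """Follow the cuts down to a final subset and apply its decision function."""
    while "cut" in node:
        w, b = node["cut"]
        node = node[1] if x[[r, s]] @ w + b >= 0 else node[0]
    w, b = node["leaf"]
    return np.sign(x @ w + b)

def count_leaves(node):
    return 1 if "leaf" in node else count_leaves(node[0]) + count_leaves(node[1])

# Invented toy data, cut in two stages.
X = np.array([[-3, 0.5], [-3, 1.5], [3, 0.5], [3, 1.5],
              [-0.5, 2], [0.5, 2], [-0.5, -2], [0.5, -2]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
tree = build(X, y, stages=2)
pred = np.array([predict_one(tree, x) for x in X])
```

Each additional stage adds cut computations and up to twice as many classifiers to store and route through, although every linear system becomes smaller.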


We test the CA on some synthetic data to show the accuracy and performance
of the proposed method. For instance, Figure 7 shows the geometric
interpretation of solving a classification problem by using the CA in two
stages with the LS-SVM. We denote by "+" the points that belong to the
positive class and by "x" the points that belong to the negative class. The
problem is nondominant with rate 0.5 (i.e., the positive class has 50
vectors and the negative class has 100 vectors). Clearly, this problem is
not linearly separable, and applying the linear LS-SVM alone gives lower
accuracy, whereas using the CA with the LS-SVM gives higher accuracy. One
can check from Figure 7 that the CA is useful in two respects: it decreases
the computational cost, and it increases the accuracy. Moreover, Figure 8
depicts the geometric interpretation of the nonlinear LS-SVM on a
nondominant problem by using the CA with three-stage cuts. On these
synthetic data the LS-SVM cannot do well, but applying the CA with the
LS-SVM gives good results.


Figure 7: Geometric interpretation of CA in three-stage cuts 


Remark 1. The aim of this paper is to reduce the computations in the LS-SVM
and to increase the accuracy. By increasing the number of stages in the CA,
the accuracy improves but the computations may increase. For this reason,
using a large number of stages is not recommended.
Figure 8: Geometric interpretation of nonlinear LS-SVM on nondominant problem by
using CA in three-stage cuts


6 Comparative results 


The approach proposed in this paper is investigated on several numerical
examples. All computations have been performed with the symbolic
computation software MATLAB, and the calculations are implemented on a
machine with Intel core 5 Duo processor 2 GHz and 4 GB RAM. We investigate
the performance of the proposed approach and compare it with the method
without the CA on some well-known datasets. Table 1 shows the properties of
the datasets, namely the number of samples and the dimension of each set
[11, 2]. We consider $C = 10$ in the LS-SVM. Also, for the nonlinear
LS-SVM, we choose the radial basis function kernel with $\sigma = 1$.
Tables 2


Table 1: Properties of the benchmark data sets (number x dimension)


Dataset      BUPA liver   Heart-Statlog   Sonar      Ionosphere   Australian   CMC
No. x Dim.   345 x 6      270 x 14        208 x 60   351 x 34     690 x 14     1473 x 9


and 3 report the classification accuracy (in percent) of the linear LS-SVM
and of the linear LS-SVM with CA, respectively. Now, consider two


Table 2: Accuracy of linear LS-SVM without CA


Dataset        BUPA liver   Heart-Statlog   Sonar   Ionosphere   Australian   CMC
Accuracy (%)   70.43        84.81           87.50   90.03        86.09        68.36


synthetic datasets that cannot be classified with a linear LS-SVM. With the
proposed CA, however, these datasets can be classified with the linear
LS-SVM. This is another advantage of the proposed algorithm. In Figures 9
and 10, the dashed line is the LS-SVM, which classifies the dataset
wrongly, but by applying the

44 M. Baymani and A. Mansoori 


Table 3: Accuracy of linear LS-SVM with CA


Dataset        BUPA liver   Heart-Statlog   Sonar   Ionosphere   Australian   CMC
Accuracy (%)   71.01        87.04           90.87   92.02        86.52        71.35


Figure 9: The dashed line is the LS-SVM, which is wrong, but the CA classifies the
dataset correctly


CA, the dataset is classified both correctly and linearly. For further
comparison, consider a synthetic dataset that neither the LS-SVM nor the
TW-SVM (Twin Support Vector Machine) can classify. Kumar and Gopal [6]
proposed a least squares twin SVM (LS-TW-SVM) for pattern classification.
They applied their approach to a synthetic dataset, and the result is shown
in Figure 11. It shows that their approach is not effective in classifying
the data. We also apply the LS-SVM and the LS-SVM with CA to another
synthetic dataset like the one in [6]. The result is shown in Figure 12.


Figure 10: The dashed line is the LS-SVM, which is wrong, but the CA classifies the
dataset correctly
Figure 11: Classification results of LS-T-SVM




Figure 12: Classification results of LS-SVM and LS-SVM with CA 


One can see that the LS-SVM cannot classify the data, whereas the LS-SVM
with CA can. Table 4 compares the classification accuracy of the LS-SVM,
LS-TW-SVM, TW-SVM, GEP-SVM (generalized eigenvalue proximal SVM [9]), and
the LS-SVM with CA on 6 UCI datasets. Table 4 shows that the generalization
capability of the LS-SVM with CA is better than that of the other methods
on many of the datasets considered.
Table 4: Comparison in accuracy for linear kernel


Dataset         LS-SVM   LS-T-SVM   T-SVM   GEP-SVM   LS-SVM with CA
Bupa Liver      70.43    70.90      70.5    66.36     71.01
Heart-statlog   84.81    85.55      86.66   85.55     87.04
Sonar           87.50    80.47      80.52   79.47     90.87
Ionosphere      90.03    89.70      88.23   84.11     92.02
Australian      86.09    86.61      86.91   80.00     86.52
CMC             68.36    68.84      68.84   68.76     71.35


7 Conclusions 


In this paper, a new algorithm for reducing the computations and improving
the accuracy of the LS-SVM was presented, which we call the Cutting
Algorithm (CA). By using some cuts, we reduced the size of the training
stage and thereby the computations. In effect, the original problem was
broken into smaller subproblems, and solving the subproblems solved the
original problem. We tested the proposed algorithm on some well-known
datasets. In addition, we showed that the proposed CA can classify
nonlinear datasets linearly. The reported results demonstrated the accuracy
and the efficiency of the approach. Finally, work is in progress to extend
the approach to solve this problem by neural network models.